Last updated: 2021-04-16

Before we start …

  • Download the workshop folder from GitHub via RStudio.
  • Open RStudio > New Project > Version Control > Git > Paste URL > Create Project
  • https://github.com/jensroes/bayesian-mm-workshop.git
  • Exercise R scripts, data, slides
  • Create a folder called “stanout”

Outline

  • Why go Bayesian, and how the Bayesian approach differs from (classical) frequentist statistics
  • Convergence diagnostics
  • Evaluating the posterior parameter distribution
  • What are priors?
  • There will be exercises!

Why go Bayesian?

Two universes

  • Two different ways to generalise from a sample to the population: what do the data tell us about the population?
  • Null-hypothesis significance testing (NHST): should we believe in the data if we assume that there’s actually no effect?
  • Bayesian statistics: what’s the probability of an effect given the data (and what we know from existing research)?

Null-hypothesis significance testing (NHST)

\[ P(\text{data} \mid H_0 = \text{TRUE}) \]

  • aka frequentist statistics; p-value based statistics
  • Inference: evaluating the probability of data assuming that the null hypothesis is true.
  • If the data are extreme enough, the null hypothesis becomes implausible.
  • Conclusion: the alternative hypothesis is assumed if the null hypothesis is rejected.
  • Data are variable and parameters are fixed

NHST and its problems

  • Intuitively, statistical significance is not binary but continuous.
  • How often are we actually interested in the null, or do we seriously believe it?
  • What do you do if the p-value is \(0.051\)?
  • Trade-off of effect size and sample size:
  • How plausible is a small / huge effect (e.g. mean difference, correlation)?
  • p-values vs parameter estimation (compare probability models).
  • The stopping rule (when data collection ends) affects the inference.
  • Inference is based on a large number of imaginary replications.
  • \(\alpha\)-level of 0.05 implies that 5% of replications will detect an effect that isn’t real (a false positive).
  • Failing to detect an effect that is real (a false negative) is a separate error, governed by statistical power.
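
The “imaginary replications” idea can be made concrete with a quick simulation (an illustrative sketch, not part of the workshop scripts): when the null is true, around 5% of tests come out significant at \(\alpha = 0.05\).

```r
# When the null hypothesis is true (both groups drawn from the same
# population), about 5% of t-tests are significant at alpha = .05.
set.seed(2021)
p_values <- replicate(5000, t.test(rnorm(30), rnorm(30))$p.value)
mean(p_values < 0.05)  # close to 0.05
```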

Convergence failure for lmer models

  • Maximal random effects structure (Baayen et al., 2008; Barr et al., 2013)
  • (g)lmer models often fail to converge.
  • Overparametrisation: unidentifiable parameter estimates (Bates et al., 2015)
  • Solutions: reducing model complexity, changing optimisers
  • The reason for favouring a model then becomes the failure to fit anything more complex; some sources of random error will not be addressed.
  • Bayesian models converge by definition (as the number of iterations approaches \(\infty\)).

Why is Bayes important for linguists (and everyone else)?

It tells us what we want to know!


Given the data (and our prior knowledge),


  • What interval contains an unobserved parameter with .95 (or some other) probability?
  • What’s the relative weight of evidence for one model (e.g., ‘null’) vs. another (e.g., an alternative)?

Why is Bayes important for linguists (and everyone else)?

  • Inference is a function of data, existing knowledge and a probability model.
  • Evidence is summarised as a probability distribution over parameter values.
  • Useful for small data sets for which prior information might play a huge role.
  • Probabilistic sampling can handle missing-data problems, complex models (many parameters, several sources of random error), ceiling / floor effects, and identifiability issues.
  • Results are not dependent on ‘sampling plan’ (Kruschke, 2018).
  • Flexible ways of modeling data and deriving inference are now easily available (e.g. brms, rstanarm, rethinking).

Why is Bayes important for linguists (and everyone else)?

Instead of


\[ p(\text{data} \mid H_0 = \text{TRUE}) \]

we can


\[ p(H \mid \text{data}) \]

Why is Bayes important for linguists (and everyone else)?

  • Estimate credibility over parameter values based on the observed data (Kruschke, 2014).
  • Given the data, which parameter values should we believe in with most confidence?
  • For this we need to start with
  1. a probability model (e.g. a normal distribution)
  2. prior expectations as to the parameter values of this model (e.g. the mean \(\mu\), the mean difference \(\beta\), the variability in the data \(\sigma\))

Bayes’ Theorem

\[ p(\theta \mid y) \propto p(\theta) \cdot p(y \mid \theta) \]

  • \(\theta\) = parameter(s), \(y\) = data, \(\mid\) = given (conditional on), \(\propto\) = proportional to
  • \(p(y \mid \theta)\) = likelihood (probability of each data point given parameter value estimates)
  • \(p(\theta)\) = prior (what do we know about the parameter(s))
  • \(p(\theta \mid y)\) = posterior distribution (our belief in different parameter values after seeing the data)
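
To see Bayes’ Theorem at work, here is a toy grid approximation for a single mean \(\mu\) with \(\sigma\) assumed known (an illustrative sketch, not the workshop code; the prior and data values are assumptions):

```r
# Grid approximation: posterior ∝ prior * likelihood, evaluated on a
# grid of candidate values for a single mean mu (sigma assumed known).
set.seed(1)
y <- rnorm(20, mean = 300, sd = 50)      # fake keystroke intervals
mu_grid <- seq(200, 400, length.out = 401)

prior      <- dnorm(mu_grid, mean = 1000, sd = 500)  # weak prior
likelihood <- sapply(mu_grid, function(m) prod(dnorm(y, mean = m, sd = 50)))
posterior  <- prior * likelihood
posterior  <- posterior / sum(posterior)             # normalise to sum to 1

mu_grid[which.max(posterior)]  # close to mean(y): the data dominate
```

With 20 observations, the weak prior barely moves the posterior mode away from the sample mean.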

Bayes’ Theorem

\[ p(\theta \mid y) \propto p(\theta) \cdot p(y \mid \theta) \]

  • The posterior is proportional to the prior times the likelihood.
  • Bayes’ Theorem allows us to determine probability distributions of parameter values from the data (plus prior).
  • In maximum likelihood estimation we search for the parameters that maximize the log-likelihood.
  • Bayesian estimation goes a step further and uses the likelihood to update existing beliefs in different parameter values.
  • How much we believe in different parameter values depends on the data and what we already know.

Fitting a model in brms

Bürkner (2017)

brms

  • An R package for Bayesian models whose syntax is no more complicated than lme4’s.
  • Supports more probability models than other regression packages:
    • gaussian, lognormal, bernoulli, poisson, zero_inflated_poisson, skew_normal, shifted_lognormal, exgaussian, et cetera
    • and allows probabilistic mixture models, nonlinear regression models, (un)equal variance signal-detection theory, multivariate models
  • Under the hood: brms creates Stan code to compile a probabilistic MCMC (Markov Chain Monte Carlo) sampler.
  • Compiling the sampler, and waiting for it to obtain the full posterior, can take time.
  • Check the tutorials by brms author Paul-Christian Bürkner (Bürkner, 2017, 2018, 2019; Bürkner & Vuorre, 2019).

Simulating data

data <- jens_data_machine(intercept = 300, slope = 15)
participant_id trial_id condition      y
             1        1         a 146.77
             1        2         b 351.40
             1        3         a 253.43
             1        4         b 276.41
             1        5         a 364.75
             1        6         b 438.49
             1        7         a 352.91
             1        8         b 303.78
             1        9         a 299.22
             1       10         b 345.09
... [600 rows in total: 30 participants × 20 trials]

Simulating data

data <- jens_data_machine(intercept = 300, slope = 15)

  • You will use the same “data” in the exercises.
  • Think about it as keystroke intervals :)
  • Parameter of interest: difference between conditions.
  • After accounting for variability in the data associated with the sampling process.
  • Mixed-effects models combine fixed and random effects.
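
jens_data_machine() ships with the workshop materials; as a sketch only, a generator with the same interface might look like the following (the by-participant and residual SDs here are my assumptions, not the workshop’s values):

```r
# Hypothetical sketch of a generator like jens_data_machine():
# by-participant random intercepts plus a treatment-coded slope.
simulate_keystrokes <- function(n_ppt = 30, n_trials = 20,
                                intercept = 300, slope = 15,
                                sd_ppt = 25, sd_resid = 90) {
  data <- expand.grid(participant_id = seq_len(n_ppt),
                      trial_id = seq_len(n_trials))
  data$condition <- ifelse(data$trial_id %% 2 == 1, "a", "b")
  u <- rnorm(n_ppt, 0, sd_ppt)                # random intercepts
  data$y <- intercept +
    (data$condition == "b") * slope +         # condition effect
    u[data$participant_id] +
    rnorm(nrow(data), 0, sd_resid)            # residual error
  data
}

set.seed(123)
data <- simulate_keystrokes()
```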

Simulating data

data <- jens_data_machine(intercept = 300, slope = 15)

Formulating the probability model

\[ y_i \sim N(\mu_i, \sigma^2), \qquad \mu_i = \beta_0 + \text{condition}_i \cdot \beta_1 + u_i \]

  • Outcome variable \(y\)
  • Tilde symbol (\(\sim\)): “is distributed according to”
  • Equal sign (\(=\)): deterministic relationship
  • \(N\): normal distribution with a mean \(\mu\) and a residual (error) variance \(\sigma^2\)
  • Subscript \(i\): every observation in \(1 \dots N\).
  • Greek symbols: unknown population parameter values (caret indicates estimated, e.g. \(\hat{\beta}\))

Formulating the model

\[ y_i \sim N(\mu_i, \sigma^2), \qquad \mu_i = \beta_0 + \text{condition}_i \cdot \beta_1 + u_i \]

  • Treatment contrast: condition a = 0, condition b = 1
contrasts(factor(data$condition)) # default contrast
  b
a 0
b 1
  • Given the data \(y\) and this model, we can estimate \(\beta_0\) and \(\beta_1\).
  • For a Bayesian model we still need priors (end of session).
  • Intercept \(\beta_0\): estimated mean of condition a (you see why?)
  • Slope \(\beta_1\): difference between condition a and condition b
  • \(u_i\) is the by-participant random intercept with \(u_i \sim N(0, \sigma_u^2)\)
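
Why the intercept estimates the mean of condition a can be checked directly (a toy sketch with simulated values, ignoring random effects): with treatment coding, setting the predictor to 0 leaves only \(\beta_0\).

```r
# With a single binary treatment-coded predictor (a = 0, b = 1),
# the OLS intercept equals the mean of condition a exactly.
set.seed(7)
condition <- rep(c(0, 1), each = 100)
y <- 300 + 15 * condition + rnorm(200, sd = 10)
fit <- lm(y ~ condition)
coef(fit)                  # intercept ~ 300, slope ~ 15
mean(y[condition == 0])    # identical to the intercept estimate
```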

R syntax

  • Frequentist mixed-effects models
library(lme4)
fit_lmer <- lmer(y ~ 1 + condition + (1|participant_id), data = data)
  • Bayesian mixed-effects models
library(brms)
fit_brm <- brm(y ~ 1 + condition + (1|participant_id), data = data)

Exercise: fit models on simulated data

  • Complete and run scripts lmer_sim.R and brms_sim.R
  • Replace --- correctly; run the script carefully line by line.
  • Check the output in the console.
  • Inspect and compare the fixed effects summaries.

Fit models on simulated data

data <- jens_data_machine(intercept = 300, slope = 15)
fit_lmer <- lmer(y ~ 1 + condition + (1|participant_id), data = data)
fit_brm <- brm(y ~ 1 + condition + (1|participant_id), data = data)

Fit models on simulated data

data <- jens_data_machine(intercept = 300, slope = 15)
coef(summary(fit_lmer)) # Coefficients of frequentist model
             Estimate Std. Error   t value
(Intercept) 300.26567   5.789763 51.861484
conditionb   15.59151   8.136926  1.916143
fixef(fit_brm) # Coefficients of Bayesian model
            Estimate Est.Error        Q2.5     Q97.5
Intercept  300.16663  5.988561 288.8796244 311.79417
conditionb  15.76034  8.047246  -0.0991772  31.46148

Convergence diagnostics

So what about this?

  • Progress of probabilistic sampler.
  • 2,000 iterations for parameter estimation per chain
  • The first half of the iterations are warm-up samples, which are discarded.
  • 4 chains to establish convergence
  • Change cores (check parallel::detectCores()) for running chains in parallel.
  • How many posterior samples have we got for our inference (= chains * (iterations - warm-up))?
  • How many iterations / chains do you need?
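
With the defaults above (4 chains, 2,000 iterations each, the first half warm-up), the sample count works out as:

```r
# Posterior draws available for inference:
chains <- 4
iter   <- 2000
warmup <- iter / 2
chains * (iter - warmup)  # 4000 draws
```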

Parameter estimation

  • Hamiltonian Monte Carlo: Markov chain Monte Carlo method for obtaining random samples.
  • The samples converge to being distributed according to a target probability distribution (the posterior).
  • Probability of values proposed by the sampler is estimated given the data and the prior.
  • Direction changes if probability of proposed estimate is lower than previous one.
  • Priors can be used to limit the parameter space (which is otherwise infinite) but are updated at every step.
  • Parameter space is really multi-dimensional.
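
brms samples with Hamiltonian Monte Carlo via Stan; the bare-bones random-walk Metropolis sampler below (an illustrative sketch, not what Stan uses) shows the shared MCMC idea of proposing values and accepting them based on the target density:

```r
# Toy random-walk Metropolis sampler for a standard normal target:
# propose a step, accept it with probability based on the target
# density, and discard the warm-up portion of the chain.
set.seed(42)
n_iter <- 20000
draws <- numeric(n_iter)
current <- 0
for (i in seq_len(n_iter)) {
  proposal <- current + rnorm(1, 0, 1)          # propose a new value
  accept_prob <- min(1, dnorm(proposal) / dnorm(current))
  if (runif(1) < accept_prob) current <- proposal
  draws[i] <- current
}
draws <- draws[-(1:2000)]                        # discard warm-up
c(mean = mean(draws), sd = sd(draws))            # approximately 0 and 1
```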

Exercise: model diagnostics

  • \(\hat{R}\) (R-hat) convergence statistic; should be \(<1.1\) (Gelman & Rubin, 1992)
rhat(fit_brm, pars = c("b_Intercept", "b_conditionb")) 
  • Traceplots: we want fat hairy caterpillars
plot(fit_brm, pars = c("b_Intercept", "b_conditionb"))
  • Posterior predictive checks: compare observed data \(y\) and model predictions \(y_{rep}\)
pp_check(fit_brm, nsamples = 100)
  • Replace the ---s in the script model_diagnostics.R and run the code (line by line).
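
The idea behind \(\hat{R}\) can be sketched by hand (a simplified version of the Gelman-Rubin statistic, computed on fake well-mixed chains):

```r
# R-hat compares between-chain and within-chain variance; for
# well-mixed chains the statistic is close to 1.
set.seed(1)
chains <- replicate(4, rnorm(1000))   # four well-mixed "chains"
n <- nrow(chains)
W <- mean(apply(chains, 2, var))      # mean within-chain variance
B <- n * var(colMeans(chains))        # between-chain variance
rhat <- sqrt(((n - 1) / n * W + B / n) / W)
rhat  # ~1; values above 1.1 would signal non-convergence
```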

Posterior probability distribution

  • Get posterior of slope \(\beta\) (difference between conditions)
beta <- posterior_samples(fit_brm, pars = "b_conditionb") %>% pull()
length(beta)
[1] 4000
beta[1:5]
[1] 26.169134 20.234850  9.773276 21.438894 12.943398

Posterior probability distribution

  • Posterior mean
mean(beta)
[1] 15.76034

Posterior probability distribution

  • 95% probability interval (conceptually different from confidence interval)
quantile(beta, probs = c(0.025, 0.975))
      2.5%      97.5% 
-0.0991772 31.4614818 
  • 89% probability interval (McElreath, 2016)
quantile(beta, probs = c(0.055, 0.945))
     5.5%     94.5% 
 2.804782 28.344642 

Posterior probability distribution

  • Probability that slope \(\beta\) is negative: \(P(\hat{\beta}<0)\)
mean(beta < 0)
[1] 0.02775
  • Probability that slope \(\beta\) is smaller than 10: \(P(\hat{\beta}<10)\)
mean(beta < 10)
[1] 0.23375

Exercise: parameter evaluation

  • Open the script parameter_evaluation.R
  • Work through each line of code: complete the missing bits (---).
  • Check out the output.

What are priors?

Chapter 7 in Lee & Wagenmakers (2014)

Priors

\[ \text{posterior} \propto \text{prior} \cdot \text{likelihood} \]

  • Help parameter estimation by limiting the parameter space.
  • Prior knowledge about plausible parameter values tells the sampler where to search for the target distribution.
  • This knowledge is expressed as probability distributions (e.g. normal distributions).
  • Small data samples are sensitive to prior information, which makes intuitive sense.
  • Otherwise data typically overcome the prior (automatic Ockham’s razor).
  • Less common: test data against a (prior) effect suggested by the literature.
  • We know more than we sometimes think!
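
That data typically overcome the prior can be checked in the conjugate normal-normal case (a textbook sketch with known \(\sigma\); the specific prior and data values are assumptions):

```r
# Normal-normal updating with known sigma: the posterior mean is a
# precision-weighted average of the prior mean and the sample mean.
# As n grows, the weight shifts from the prior to the data.
posterior_mean <- function(n, ybar, sigma = 50,
                           prior_mean = 1000, prior_sd = 100) {
  w <- (n / sigma^2) / (n / sigma^2 + 1 / prior_sd^2)  # weight on data
  w * ybar + (1 - w) * prior_mean
}
sapply(c(1, 10, 100), posterior_mean, ybar = 300)
# moves from near the prior mean toward the sample mean of 300
```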

Priors: intercept

  • A priori, each value in the parameter space is equally possible.
  • We want to help our sampler by constraining the parameter space!
  • Let’s think about the parameter space for models of keystroke data.

Priors: intercept

  • Intercepts are some form of average (depending on contrast coding).
  • Can keystroke intervals range between \(-\infty\) and \(\infty\)?
  • What are the lower and upper ends?

Priors: intercept

  • Intercepts are some form of average (depending on contrast coding).
  • Can keystroke intervals range between \(-\infty\) and \(\infty\)?
  • What are the lower and upper ends?
  • How long is the average pause before a sentence?

\[ \text{pre-sentence pause} \sim N(???, ???) \]

Priors: intercept

  • Intercepts are some form of average (depending on contrast coding).
  • Can keystroke intervals range between \(-\infty\) and \(\infty\)?
  • What are the lower and upper ends?
  • How long is the average pause before a sentence?

\[ \text{pre-sentence pause} \sim N(1000 \text{ msecs}, ???) \]

Priors: intercept

  • Intercepts are some form of average (depending on contrast coding).
  • Can keystroke intervals range between \(-\infty\) and \(\infty\)?
  • What are the lower and upper ends?
  • How long is the average pause before a sentence?
  • How short / long can this be?
  • Plausible probability distribution for pre-sentence pauses:

\[ \text{pre-sentence pause} \sim N(1000 \text{ msecs}, ???) \]

Priors: intercept

  • Intercepts are some form of average (depending on contrast coding).
  • Can keystroke intervals range between \(-\infty\) and \(\infty\)?
  • What are the lower and upper ends?
  • How long is the average pause before a sentence?
  • How short / long can this be?
  • Plausible probability distribution for pre-sentence pauses:

\[ \text{pre-sentence pause} \sim N(1000 \text{ msecs}, 500 \text{ msecs}) \]
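
A quick way to check whether this prior is reasonable is to draw from it (a sketch using the prior proposed above):

```r
# Drawing from the proposed intercept prior shows which pause
# durations it treats as plausible -- note it also allows negative
# values, one reason truncation (e.g. a lower bound of 0) can help.
set.seed(1)
draws <- rnorm(10000, mean = 1000, sd = 500)
quantile(draws, c(0.025, 0.975))  # roughly 20 to 1980 msecs
mean(draws < 0)                   # small, but not zero
```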

Priors: slope

  • Say, we compare people writing in L1 and L2.
  • Writing in L2 can be harder because of, e.g., lexical retrieval.
  • Hence, pre-word / pre-sentence keystroke intervals are sometimes longer.
  • What is a plausible prior for this delay?

Priors: slope

  • Say, we compare people writing in L1 and L2.
  • Writing in L2 can be harder because of, e.g., lexical retrieval.
  • Hence, pre-word / pre-sentence keystroke intervals are sometimes longer.
  • What is a plausible prior for this delay?

\[ \text{slowdown for L2s} \sim N(???, ???) \]

Priors: slope

  • Say, we compare people writing in L1 and L2.
  • Writing in L2 can be harder because of, e.g., lexical retrieval.
  • Hence, pre-word / pre-sentence keystroke intervals are sometimes longer.
  • What is a plausible prior for this delay?

\[ \text{slowdown for L2s} \sim N(250 \text{ msecs}, 100\text{ msecs}) \]

Priors: slope

  • We often don’t know what the slope might be or can’t make strong predictions about the effects.
  • In fact, this might be why we run a statistical model.
  • However, we still have an intuition what is plausible and what isn’t.
  • Say, words with alternative spellings (accordian, accordion) may or may not lead to longer pauses (than words with fewer possible spellings; asparagus).
  • We are not sure, so let’s use a mean of 0 msecs; what would be a plausible \(\sigma\)?

Priors: slope

  • We often don’t know what the slope might be or can’t make strong predictions about the effects.
  • In fact, this might be why we run a statistical model.
  • However, we still have an intuition what is plausible and what isn’t.
  • Say, words with alternative spellings (accordian, accordion) may or may not lead to longer pauses (than words with fewer possible spellings; asparagus).
  • We are not sure, so let’s use a mean of 0 msecs; what would be a plausible \(\sigma\)?

\[ \text{slope} \sim N(0 \text{ msecs}, ???) \]

Priors: slope

  • We often don’t know what the slope might be or can’t make strong predictions about the effects.
  • In fact, this might be why we run a statistical model.
  • However, we still have an intuition what is plausible and what isn’t.
  • Say, words with alternative spellings (accordian, accordion) may or may not lead to longer pauses (than words with fewer possible spellings; asparagus).
  • We are not sure, so let’s use a mean of 0 msecs; what would be a plausible \(\sigma\)?

\[ \text{slope} \sim N(0 \text{ msecs}, 100\text{ msecs}) \]

  • We could even truncate the prior if we believe that the value must be smaller / larger than a certain value.

Priors

  • Even weakly informative priors are helpful for estimating parameter values.
  • They help to constrain the parameter space.
  • The parameter space should reflect what we think is plausible.
  • This is important for more complex models (mixtures, Wiener diffusion models).

Priors

Check defaults used last session:


fit_brm <- readRDS(file = "../stanout/brms_sim.rda")
prior_summary(fit_brm)
                     prior     class       coef          group
                    (flat)         b
                    (flat)         b conditionb
 student_t(3, 308.2, 98.7) Intercept
     student_t(3, 0, 98.7)        sd
     student_t(3, 0, 98.7)        sd                participant_id
     student_t(3, 0, 98.7)        sd  Intercept     participant_id
     student_t(3, 0, 98.7)     sigma

Student t-distribution

  • Symmetric continuous distribution with fat tails, assigning more probability to extreme values than the normal distribution.
  • \(\nu\) parameter controls the degrees of freedom
  • \(\nu = 1\) is a Cauchy distribution
  • When \(\nu \rightarrow \infty\) the t-distribution becomes Gaussian.
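
These tail properties are easy to verify in R:

```r
# Tail probability beyond +/- 3 (in units of the scale parameter):
# heavier tails for small nu, approaching the normal as nu grows.
tails <- sapply(c(1, 3, 30), function(nu) 2 * pt(-3, df = nu))
tails          # nu = 1 (Cauchy) has much fatter tails
2 * pnorm(-3)  # the Gaussian limit
```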

Model specification

Instead of


fit <- brm(outcome ~ predictor + (1|participant), data = data)


do …


# Create model
model <- bf(outcome ~ predictor + (1|participant))
# specifying model formula outside of brms works for lmer too.
# bf = brmsformula

# Fit brms
fit <- brm(model, data = data)

Prior specification

# Create model
model <- bf(outcome ~ predictor + (1|participant))

# Look at priors: some have reasonable defaults, others are flat.
get_prior(model, data = data)

# Specify priors
prior <- set_prior("normal(1000, 500)", class = "Intercept", lb = 0, ub = 100000) +
         set_prior("normal(0, 100)", class = "b") # for slope(s)

# Fit brms
fit <- brm(model, data = data, prior = prior)

Exercise: priors

  • Adding priors to last session’s model in brms_sim_with_prior.R; replace the ---s according to the comments in the script.
  • If you have time, make the standard deviation of the prior for slope \(b\) either much larger or smaller and check how this affects the posterior estimate.

Prior simulation
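
The slides showed plots of prior draws at this point; a minimal prior predictive simulation, assuming the intercept and slope priors proposed above, could look like this:

```r
# Prior predictive sketch: draw intercepts and slopes from the priors
# and inspect the implied condition means before seeing any data.
set.seed(1)
n_sim <- 1000
b0 <- rnorm(n_sim, mean = 1000, sd = 500)  # intercept prior
b1 <- rnorm(n_sim, mean = 0, sd = 100)     # slope prior
mu_a <- b0        # implied mean, condition a
mu_b <- b0 + b1   # implied mean, condition b
quantile(mu_b, c(0.025, 0.5, 0.975))
```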


The end

For next session

  • We will compare different models of keystroke intervals (same data; 5 different probability models).
  • Run the following models and save their posterior:
  1. gaussian_sentence.R
  2. lognormal_sentence.R
  3. exgaussian_sentence.R
  4. mixture_model_sentence.R
  5. skewnormal_sentence.R (spoiler: skip this one if you don’t have time)
  • We need their posteriors for the next session.
  • Don’t leave this until the last minute :)

References

Baayen, R. H., Davidson, D. J., & Bates, D. M. (2008). Mixed-effects modeling with crossed random effects for subjects and items. Journal of Memory and Language, 59(4), 390–412.

Barr, D. J., Levy, R., Scheepers, C., & Tily, H. J. (2013). Random effects structure for confirmatory hypothesis testing: Keep it maximal. Journal of Memory and Language, 68(3), 255–278.

Bates, D., Kliegl, R., Vasishth, S., & Baayen, H. (2015). Parsimonious mixed models. arXiv Preprint arXiv:1506.04967.

Bürkner, P.-C. (2017). brms: An R package for Bayesian multilevel models using Stan. Journal of Statistical Software, 80(1), 1–28. https://doi.org/10.18637/jss.v080.i01

Bürkner, P.-C. (2018). Advanced Bayesian multilevel modeling with the R package brms. The R Journal, 10(1), 395–411. https://doi.org/10.32614/RJ-2018-017

Bürkner, P.-C. (2019). Bayesian item response modeling in R with brms and Stan. arXiv Preprint arXiv:1905.09501.

Bürkner, P.-C., & Vuorre, M. (2019). Ordinal regression models in psychology: A tutorial. Advances in Methods and Practices in Psychological Science, 2(1), 77–101.

Gelman, A., & Rubin, D. B. (1992). Inference from iterative simulation using multiple sequences. Statistical Science, 7(4), 457–472.

Kruschke, J. K. (2014). Doing Bayesian data analysis: A tutorial with R, JAGS, and Stan (2nd ed.). Academic Press.

Kruschke, J. K. (2018). Rejecting or accepting parameter values in Bayesian estimation. Advances in Methods and Practices in Psychological Science, 1(2), 270–280.

Lee, M. D., & Wagenmakers, E.-J. (2014). Bayesian cognitive modeling: A practical course. Cambridge University Press.

McElreath, R. (2016). Statistical rethinking: A Bayesian course with examples in R and Stan. CRC Press.